Simple Language Models for Spam Detection

نویسنده

Egidio L. Terra

چکیده

For this year’s Spam track we used classifiers based on language models. These models are used to compute the log-likelihood for each individual message and then classify them as either ham or spam. Different data sets were used to train these language models. Our approach is simple, we initially create simple unigram language models and smooth the probabilities of unseen tokens by means of the expected likelihood estimator with a small discount probability tuned in a training corpus.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...

متن کامل

Language Model Issues in Web Spam Detection

Language models have been widely used in the detection of spam pages in the web. However even though most of the experiments using language models to detect spam have got improved results, there exists several problems in the use of language models which affects the validity of the results. This paper points out the shortcomings of using language models specifically KL-Divergence and suggested ...

متن کامل

A New Model for Email Spam Detection using Hybrid of Magnetic Optimization Algorithm with Harmony Search Algorithm

Unfortunately, among internet services, users are faced with several unwanted messages that are not even related to their interests and scope, and they contain advertising or even malicious content. Spam email contains a huge collection of infected and malicious advertising emails that harms data destroying and stealing personal information for malicious purposes. In most cases, spam emails con...

متن کامل

Using Language Models for Spam Detection in Social Bookmarking

This paper describes our approach to the spam detection task of the 2008 ECML/PKDD Discovery Challenge. Our approach focuses on the use of language models and is based on the intuitive notion that similar users and posts tend to use the same language. We compare using language models at two different levels of granularity: at the level of individual posts, and at an aggregated level for each us...

متن کامل

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest among people to use short message service (SMS) as one of the essential and straightforward communications services on mobile devices. The increased popularity of this service also increased the number of mobile devices attacks such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2005

Simple Language Models for Spam Detection

نویسنده

چکیده

منابع مشابه

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Language Model Issues in Web Spam Detection

A New Model for Email Spam Detection using Hybrid of Magnetic Optimization Algorithm with Harmony Search Algorithm

Using Language Models for Spam Detection in Social Bookmarking

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

عنوان ژورنال:

اشتراک گذاری